Cryptographic Pseudonym Mapping Method, Computer System, Computer Program And Computer-Readable Medium

ABSTRACT

The invention is a cryptographic pseudonym mapping method for an anonymous data sharing system, the method being adapted for generating a pseudonymized database (DB) from data relating to entities and originating from data sources (DS_(i)), wherein the data are identified at the data sources (DS_(i)) by entity identifiers (D) of the respective entities, and wherein the data are identified in the pseudonymized database (DB) by pseudonyms (P) assigned to the respective entity identifiers (D) applying a one-to-one mapping. According to the invention, more than one (i.e. a number k of) mappers (M_(j)) are applied, and the respective pseudonyms (P) are generated by sequentially performing, in a permutation of the mappers (M_(j)), a number k of mappings utilizing mapping cryptographic keys (h_(ij)) of the mappers (M_(j)) belonging to the particular data source (DS_(i)) on each encrypted entity identifier (C_(i0)) encrypted by the data source (DS_(i)). The invention is further a computer system realizing the invention, as well as a computer program and a computer-readable medium.

TECHNICAL FIELD

The invention relates to a cryptographic method and computer system for pseudonym mapping, a computer program and a computer-readable medium, preferably for implementing a system for data sharing wherein the data can be analysed in an anonymous manner. The invention provides a secure pseudonymisation solution that complies with the regulations of GDPR.

BACKGROUND ART

WO 2017/141065 A1 entitled “Data management method and registration method for an anonymous data sharing system, as well as data manager and anonymous data sharing system” discloses a solution for analysing data residing with multiple mutually independent entities (hereinafter: data sources) in a way that the data are loaded in a single unified database in which the identifiers of the entities (for example, persons, companies) are stored applying pseudonyms adapted for protecting anonymity, ensuring that the original data cannot be restored from the pseudonyms. The present invention complements the solution disclosed in WO 2017/141065 A1 with a pseudonym mapping method that is secure from a number-theoretical aspect. However, security risks for the process of assigning the pseudonyms to the original identifiers are posed not only by the vulnerability of the pseudonym mapping algorithm to number-theoretical attacks. WO 2017/141065 A1 provides a detailed description of the measures that have to be taken, in addition to providing the pseudonym mapping method, in order to secure the anonymity of the database containing the pseudonyms. These are, among others, the prohibition of assigning attributes to data, the analysis of k-anonymity and l-diversity, or the prevention of node identification based on the morphological properties of the graph reflecting the interrelations of entities. All the methods described in the referenced document can also be applied in the present invention, including the case wherein ensuring the anonymity of the data sources is also a requirement. This is especially important in the case when the data sources report data related to themselves.

Nowadays, almost all real-world events leave traces in the form of data stored in the digital space. The analysis of these data allows for making valuable inferences. The data are stored at a plurality of entities that are usually not in a dependent relationship with one another. The data are often characteristic of entities (for example, persons, companies, institutions, properties, apparatuses, financial assets, etc.) or describe the behaviour thereof. In the databases, the entities are referred to applying widely known entity identifiers (for example, social security number, tax number, land registry number). The analysable data that are characteristic of the entities according to the entity identifiers are called attributes.

An analysis that better approximates reality can be carried out concerning the behaviour and the interrelations of the entities in case the widest possible scope of data can be utilized for the analysis. The best way to do that would be to analyse all the available data applying a single database. However, the databases often contain confidential information, or, for example in the case of natural persons, legally protected information. This sets limits for data managers in sharing the data managed by them for aggregated analytical purposes. Because of that, the data managers, i.e. the data sources, have to pass on the data such that the entities performing the pseudonymisation mapping and the analysis applying the common database are not able to access the original entity identifier. This is feasible because, in most cases, the aim of the analysis is not understanding the properties, behaviour or contact network of a particular person or thing, but recognising patterns of behaviour that can be expected from (anonymous) individuals in a larger population, analysing the structure of contact networks, and making inferences related to the future course of events.

The requirements set for the mapping between the unencrypted, open entity identifier and the anonymous identifier (hereinafter: pseudonym) stored in the common database are defined by the method disclosed in WO 2017/141065 A1. This mapping can be practically implemented only by utilizing a special information-technology device, namely, a cryptoprocessor (a dedicated computer unit that performs cryptographic operations under physical protection). In open multi-user systems this poses problems for the applicability of the system. In contrast to the mappings carried out in a single step, the known technical solution usually provides protection against “brute force”-type attacks (wherein, by possessing information on the operation of the encryption system, the applied key is determined by trying each possible key), but malicious cooperation between a data source and the entity performing the mapping can be prevented only by applying a complementary method, for example by encrypting the mapped values by an additional entity.

The pseudonym can be applied for the purposes of the above described analysis if a given open entity identifier is entered into the common database under the same pseudonym, irrespective of which data source sent it, i.e. the mapping between the unencrypted identifiers and the pseudonyms has to be a one-to-one mapping, where the inverse of the mapping cannot be computed, i.e. the unencrypted entity identifier cannot be generated from the pseudonym, by any entity. If the mapping is carried out by the data sources, then they also have to apply the same mapping. If an algorithmically non-reversible mapping is required, then a cryptographic hash function is usually applied, with the unencrypted data being the input of the function, and the output value being in this case the pseudonym. What poses a problem is that the multiplicity of the entity identifiers is usually low, on the order of between a hundred million and a few tens of billions. For a set of this size, a rainbow table (a pre-computed table for inverting cryptographic hash functions) can be generated in a very short time. Therefore, in the course of computing the hash value, the input data are complemented with “salt” (randomly chosen data applied as additional input data of hash functions). In such a case, all entities have to apply the same “salt” so that the one-to-one relationship can be maintained. However, data that are used by all of the data sources can hardly be regarded as a secret, and to perform the calculations it is not even necessary to know the value if the attacker can access the system of any of the data sources (for example, the attacker can be one of the data sources that is not restricted in any way in performing an arbitrary number of mappings).
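The weakness of a shared salt can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function, a deliberately tiny identifier space of six-digit numbers, and a hypothetical salt value; none of these choices come from the referenced documents. Any party that can use the shared salt can enumerate the whole identifier space and invert every pseudonym by lookup.

```python
import hashlib

def salted_pseudonym(identifier: str, salt: bytes) -> str:
    """Hash-based pseudonym: SHA-256 over the shared salt followed by the identifier."""
    return hashlib.sha256(salt + identifier.encode("utf-8")).hexdigest()

def build_lookup_table(candidates, salt):
    """Pre-compute pseudonym -> identifier for every candidate identifier."""
    return {salted_pseudonym(c, salt): c for c in candidates}

# Hypothetical values chosen only for illustration.
SHARED_SALT = b"hypothetical-shared-salt"
ID_SPACE = [f"{n:06d}" for n in range(100_000)]  # toy identifier space

# A participant that can use the salt enumerates the whole identifier space once...
table = build_lookup_table(ID_SPACE, SHARED_SALT)

# ...and can then invert any pseudonym found in the common database.
observed = salted_pseudonym("042317", SHARED_SALT)
recovered = table[observed]
```

Real identifier spaces are larger than this toy example, but the cost of the enumeration grows only linearly with their size.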

Another possibility is to entrust the generation of the relation between the unencrypted data or the data encrypted by the data sources applying the same encryption and the pseudonym to a trusted cooperator. The trusted cooperator is able to compile the rainbow table trivially in the first case, and in the second case, by gaining access to only a single data source's system. Therefore, the solution according to WO 2017/141065 A1 (US 2019/213356 A1) came to the conclusion that the data sources have to apply an encryption method based on a unique, for example, an own, cryptographic key. In such a case, the same entity identifier is sent by the data sources as different ciphers (encrypted data), while pseudonym mapping has to be performed such that the different ciphers have to be assigned to the same pseudonym if the particular ciphers were computed from the same unencrypted identifier. In the solution implemented according to the document, RSA keys are applied, wherein the decryption key is stored in a Trusted Platform Module (TPM, see for example ISO/IEC 11889), and the decryption process and the mapping of the unencrypted data into the pseudonym are carried out utilizing a secure cryptoprocessor. This architecture is difficult to implement and requires significant initial investment, while its operation is also cumbersome because the required hardware infrastructure scales linearly with the number of data sources.

EP 3 188 070 A1 discloses a double encryption method, while proxy cryptography is disclosed in Patil Shravani Mahesh et al, “RSA-Based Collusion Resistant Quorum Controlled Proxy Re-encryption Scheme for Distributed Secure Communication”, 11 Dec. 2018 (2018-12-11), Advances in Databases and Information Systems; [Lecture Notes in Computer Science; Lect. Notes Computer], Springer International Publishing, Cham, page(s) 349-363.

DESCRIPTION OF THE INVENTION

The object of the invention is to eliminate, or to reduce the impact of, the drawbacks of prior art technical solutions, especially the prior art solution presented above.

The primary object of the invention is to provide a cryptographic pseudonym mapping solution that does not require the use of secure hardware, for example a cryptoprocessor, for performing decryption and for mapping the unencrypted data to the pseudonym.

The objects of the invention have been fulfilled by providing the cryptographic pseudonym mapping method according to claim 1, the computer system according to claim 12, the computer program according to claim 17, and the computer-readable medium according to claim 18. Preferred embodiments of the invention are defined in the dependent claims.

The cryptographic pseudonym mapping method according to the invention is adapted for generating a pseudonymised database from entity data, wherein the data are identified at the data sources utilizing the entity identifiers of the respective entities, and wherein the data are identified in the pseudonymised database utilizing pseudonyms assigned to the respective entity identifiers applying a one-to-one mapping.

The present invention is a solution utilizing characteristics of modular exponentiation performed on residue classes, and the properties of operations based on specially selected discrete points of elliptic curves, and preferably also blockchain technology or a similar technology providing decentralized authenticity that implements the required abstract mapping, while not containing the above mentioned limitations related to the prior art.

In contrast to the prior art, the invention does not require any special hardware for storing the cryptographic keys or for performing calculations, but instead solves the problem by purely cryptographic means. This requires first of all that the entity identifiers have to be assigned to elements of the algebraic (mathematical) structure (see for example in Wikipedia) on which the cryptographic calculations are performed. Information technology devices apply a binary representation of data, so data can be interpreted as positive integers that can be utilized for performing calculations. In the following, it is assumed of the domain of the mappings that it is capable of providing a unique representation of the entity identifiers and the computed ciphers. For example, if the computations are performed on a cyclic group (see for example in Wikipedia) of residue classes, then the modulus is chosen to be large enough that a sufficient number of residue classes are available. Due to the key sizes applied in practical implementations, this does not pose a problem. In the case of modular exponentiation performed on residue classes, for example, the exponent can be represented applying many more bits compared to practically occurring entity identifiers. In such cases, the so-called “padding” of the values can be considered, such that the exponentiation performed with a low base cannot be inverted by ordinary root computation. Such an inversion is possible in case modular arithmetic is not required during the process of computing the result. Due to the requirement of applying a one-to-one mapping, only deterministic padding methods can be applied.

Therefore, a plurality of data sources is considered, each data source comprising a database containing entity identifiers and attributes. The data have to be collected in a common database such that the entity identifiers are included therein applying pseudonyms according to the following:

-   (1) A given entity identifier has to be mapped to the same pseudonym, irrespective of the data source it was received from.
-   (2) The same pseudonym must never be assigned to two different entity identifiers.
-   (3) The relationship between an unencrypted identifier and its pseudonym must not be obtainable by any participant of the system utilizing only the information known by it, even if a data source cooperates with a participant taking part in the mapping with the intention of breaking the encryption.
-   (4) If an attack is started by one or more but not all parties, even in cooperation, in order to disclose the relationship between an unencrypted entity identifier and its pseudonym (for example in order to compile a rainbow table), then it has to be detected by the other parties.

Conditions (1) and (2) together imply that the mapping has to be a one-to-one mapping. Cryptographic mappings meet this requirement, provided we remain inside the domain (in cryptography, the message domain) thereof. Condition (3) excludes all mappings that can be performed by only one or two participants, without cooperating with others. The same follows from condition (4). It must not be possible for the data source to track the steps of the mapping, because otherwise it can trivially obtain the pseudonym as the result of the last computation step. The results of their computations are of course accessible to the entities performing the mapping, so they must not access the unencrypted entity identifiers. This can be provided if the entity identifiers are sent by the data sources to the mapper entities applying their own unique encryption, i.e. utilizing their own cryptographic key, but the data sources either cannot “see” the pseudonym mapping computations or they cannot relate them to the data provided by themselves.

According to the technical solution described in WO 2017/141065 A1, pseudonym mapping has to be performed applying the cipher by breaking down the mapping into steps wherein a given step can be performed only by a single participating entity adapted to perform the mapping:

$$P = g_{b}\left(f_{\mathrm{key}_i}^{-1}(C_i)\right) = f_{b}\left(f_{\mathrm{key}_i}^{-1}\left(f_{\mathrm{key}_i}(D)\right)\right)$$

where D is the entity identifier, P is the pseudonym, i is the numeric identifier of the data source, and C_(i) is the cipher computed applying its own key. The different mappings in an encryption system usually execute the same algorithm applying different keys. Therefore, the mapping g performed applying the key b can be replaced by f_(b). Applying a single mapper, for example a secure cryptoprocessor, the mapper is adapted for decrypting the cipher, followed by mapping the unencrypted data to the pseudonym P applying the pseudonym mapping key b. For example, applying the RSA method (see for example in U.S. Pat. No. 4,405,829 A) the cryptographic key of the i-th data source is (e_(i), N), where e_(i) is the encryption exponent and N is the modulus. The cipher is obtained by the calculation

$$C_i \equiv D^{e_i} \bmod N$$

and is sent to the entity performing pseudonym mapping that generates unencrypted data utilizing the decryption key (d_(i), N), where d_(i) is the exponent, performing the calculation

$$D \equiv C_i^{d_i} \bmod N$$

According to U.S. Pat. No. 4,405,829 A this calculation is performed for example applying a secure cryptoprocessor such that the mapper cannot access the unencrypted data but can use the results for computing the pseudonym. The pseudonym is obtained from the unencrypted data utilizing the cryptographic key (b, N) of the mapping g ≡ f_(b) (here, unlike elsewhere in this description, the ≡ sign denotes identity rather than congruence):

$$P \equiv D^{b} \bmod N$$

It is important that the values d_(i) and b cannot be read out from the device performing the computation; such a device is for example the Trusted Platform Module chip. Because both g and f represent modular exponentiation modulo N, hereinafter only f is used. Using the notation of the above example, the entire mapping is

$$P \equiv \left(\left(D^{e_i} \bmod N\right)^{d_i} \bmod N\right)^{b} \bmod N$$

where the innermost cipher computation utilizing the exponent e_(i) is performed by the data source, followed by the mapper performing the computation applying the exponent b.
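The single-mapper composition above can be traced numerically. The following is a minimal sketch with toy parameters: the primes 1009 and 1013, the exponents 5, 17 and 13, and the identifier 123456 are all assumptions chosen for readability and are far too small for real use. It shows that two data sources with different encryption exponents arrive at the same pseudonym, and that the single mapper necessarily sees the unencrypted identifier unless it is hardware-protected.

```python
from math import gcd

# Toy parameters (assumptions for illustration only, not secure sizes).
p, q = 1009, 1013
N = p * q
phi = (p - 1) * (q - 1)          # Euler totient of N

b = 13                           # pseudonym mapping exponent of the single mapper
assert gcd(b, phi) == 1

def source_key(e):
    """Return (e_i, d_i) with e_i * d_i == 1 (mod phi); requires Python 3.8+."""
    assert gcd(e, phi) == 1
    return e, pow(e, -1, phi)

def single_mapper_pseudonym(D, e_i, d_i):
    C_i = pow(D, e_i, N)         # performed by the data source
    D_plain = pow(C_i, d_i, N)   # performed by the mapper: the identifier is exposed here
    return D_plain, pow(D_plain, b, N)

D = 123456                       # entity identifier, must be smaller than N
e1, d1 = source_key(5)           # data source 1
e2, d2 = source_key(17)          # data source 2

plain1, P1 = single_mapper_pseudonym(D, e1, d1)
plain2, P2 = single_mapper_pseudonym(D, e2, d2)
```

Both calls return the same pseudonym `pow(D, b, N)`, but `D_plain` equals the original identifier inside the mapper, which is the exposure that conditions (i.) and (ii.) below are meant to rule out.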

An object is to present a computation method for performing the latter two mappings in the course of which the entity performing the computation

-   i. is not able to access the entity identifier D, i.e. the unencrypted data, and
-   ii. is not able to access the exponent b that is applied for generating a pseudonym from the unencrypted data.

It follows from condition (i.) that the entity performing the computation must also not be able to access d_(i) because otherwise it could decrypt the cipher. Condition (ii.) is required in order to prevent a successful trial-and-error or rainbow-table based attack by the mapper. In the exemplary mappings, data are represented applying residue classes defined by a positive integer modulus (N).

In the solution according to the invention, decryption applying an inverse key and mapping implemented applying multiple mappers can be performed in an arbitrary number of steps such that unencrypted data (an entity identifier) is not generated in the course of the computations, no entity is able to obtain the decryption key key_(i)⁻¹, and also no entity is able to obtain the pseudonym mapping key b, i.e. no entity is able to generate a pseudonym from unencrypted data in secret, i.e. to compile a rainbow table. The solution also ensures that in case of a particular mapping the execution order of the mappings performed by the mappers cannot be established in advance, thereby it is made more difficult for the participating entities to successfully cooperate with the aim of cracking the system. To provide for that, information technology methods based on known number theoretical bases are applied, including means disclosed in relation to the protocols applied by blockchain technology.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the invention are described hereinafter by way of example with reference to the following drawings, where

FIG. 1 is a schematic diagram of a solution according to the invention implemented applying a key manager,

FIG. 2 is a schematic diagram of a solution according to the invention implemented without a key manager, and

FIG. 3 is a table schematically illustrating an exemplary pseudonym mapping process.

MODES FOR CARRYING OUT THE INVENTION

According to the invention it has been recognised that the characteristics of algebraic structures constituting multiplicative or additive cyclic groups can be preferably utilized to fulfil the objects of the invention. Two types of solutions based on such algebraic structures are described below in more detail, but, according to the invention, other such algebraic structures that provide the arithmetic required for the operation of the invention can also be applied. Of the exemplary algebraic structures, a solution involving residue classes modulo N (where N is a positive integer) is first described in detail, followed by describing, in relation to the former, a solution involving points of elliptic curves defined over the number field of residue classes modulo ρ (where ρ is a prime).

The entity identifiers and corresponding data are stored in databases at mutually independent data sources, and, after pseudonymisation according to the invention, the data, together with the pseudonyms generated from the entity identifiers, are stored, assigned to each other, in a central pseudonymised database. Complying with the conditions of the object set for the invention, for a given entity identifier, the relationship between unencrypted data and the pseudonym cannot be affected by the origin of the data (i.e. what data source it came from). However, the process of mappings, i.e. the operations performed at the particular stages, are unique for each data source, that is, different cryptographic keys (for example, modular exponents) have to be used for performing the same mapping. Evidently, the range of a mapping preceding another one in the sequence cannot be greater than the domain of the latter. For residue classes this implies that the value of the modulus cannot be decreased during the process. Since the order of the mappings depends on the mapped data and on the applied key, i.e. is different each time, this condition can only be fulfilled by applying a constant modulus. Therefore, the implementation of a data gathering system necessarily begins with selecting an appropriate modulus. This is carried out in practice by the provider of the data gathering service, or the data gathering community, first deciding upon the bit length of the applicable keys. Then, two prime numbers are selected such that their product (applied as the modulus) can be represented using the given number of bits. The entity or entities generating the keys (for example the key manager or the data sources) have to know the modulus N and also its value φ(N) given by the Euler function, or in other words, its totient value. The value N of the modulus has to be known by all participants performing mappings.

If the representation size of the entity identifiers to be mapped is significantly smaller than the key size, some kind of padding method is preferably applied. This method has to be deterministic in the sense that every data source has to obtain the same padded value, such that the pseudonym is also deterministic, irrespective of the data source. The basic data of the mapping are therefore N and φ(N).

According to the invention, by random selection it is meant that the implementation of the method is not dependent on which particular elements of the given set are chosen. Accordingly, random selection is meant to include also quasi-random or pseudo-random selection, as well as all such selection methods (even according to rules unknown to an observer) wherein the selection appears to be random to the outside observer. If the set constitutes an algebraic structure and has a null element and/or a unit element, then these are not regarded as randomly selected.

Also, in the case of residue classes, the selection of non-relatively prime values is avoided. However, for cryptography considerations it is worth selecting values for which the bit length of their representation fills up all the available space.

As can be seen in FIGS. 1 and 2, entity identifiers D and attributes (the latter are not shown in the diagram) describing the entities or their behaviour are stored by data sources DS_(i) in their respective own databases. The relationship between a particular entity and the other entities can be regarded as a characteristic of the given entity. Thus, in such a relation the entity identifiers of the other entities are regarded as attributes (for example, B is a client of A, in which case B is an attribute and an entity identifier at the same time). Because the aim of the technical solution according to the invention is supporting anonymity, such data can also be regarded as an entity identifier.

The attributes related to the entity identifiers D are preferably passed on by the data sources DS_(i) as unencrypted data, while the entity identifiers D are encrypted by the data sources DS_(i) utilizing their own cryptographic keys. The resulting cipher is sent to the entities adapted to perform the mapping to the pseudonym P, i.e. to the mappers M_(j). At the same time, an assignment between the unencrypted data and the cipher, i.e. the encrypted entity identifier, is maintained, because the database required for data analysis can only be loaded with useful information in such a manner.

For the security of the pseudonym mapping it is crucial that no entity is able to carry out the operation by itself, i.e. no one is able to generate a pseudonym from unencrypted data. This is possible only if no entity possesses the value of the below described exponent b that, together with the modulus N known by everyone, is sufficient for mapping an unencrypted value to a pseudonym: P ≡ D^(b) mod N.

The above condition can be fulfilled only in case b is not computed. Because we are dealing with an exponent of modular exponentiation over residue classes, the operation can also be performed utilizing the multiplicative factors of b. If b=b₁·b₂· . . . ·b_(k), then

$$P \equiv D^{\,b_1 \cdot b_2 \cdots b_k \bmod \varphi(N)} \bmod N \equiv D^{b} \bmod N$$

If the system comprises a number k of mappers, then each may generate a factor for b that is relatively prime to φ(N). If φ(N) is not known to them, then they choose a prime number. Thereby, they can carry out the above mapping only collectively. In order to do that, it is not necessary to share the factors b_(j) among them. What is required is that each mapper performs a modular exponentiation exactly once (in an arbitrary order), utilizing its own factor:

$$P \equiv \left(\left(\left(D^{b_{j_1}}\right)^{b_{j_2}}\right)^{\cdots}\right)^{b_{j_k}} \bmod N \equiv D^{\,b_{j_1} \cdot b_{j_2} \cdots b_{j_k}} \bmod N$$

where the indices j_(p)∈{1 . . . k} stand for an arbitrary (arbitrary-order) permutation of the factors of b.
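The order independence of the factor-wise exponentiation can be checked with a short sketch. The modulus built from the primes 1009 and 1013, the three factors 13, 17 and 19, and the identifier 123456 are toy assumptions; the check itself only relies on the identity stated above.

```python
from itertools import permutations
from math import gcd, prod

# Toy parameters (assumptions for illustration only).
N = 1009 * 1013
phi = 1008 * 1012

factors = [13, 17, 19]                     # b_1, b_2, b_3, one secret factor per mapper
assert all(gcd(b_j, phi) == 1 for b_j in factors)

D = 123456                                 # entity identifier, coprime to N
b = prod(factors) % phi                    # never computed by any participant in the scheme
expected = pow(D, b, N)

results = set()
for order in permutations(factors):        # every possible sequence of the mappers
    value = D
    for b_j in order:                      # each mapper exponentiates exactly once
        value = pow(value, b_j, N)
    results.add(value)
```

After the loop, `results` contains a single value equal to `expected`, so the sequence of the mappers can be chosen freely for each mapping.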

The cryptographic keys e_(i) of the data sources DS_(i) and the decryption factors are generated as follows. The pseudonym has to be computed by the mappers M_(j) not from unencrypted data, but from the cipher computed by the data sources DS_(i). If the product b of the exponent factors was available, then the cipher

$$C_i \equiv D^{e_i} \bmod N$$

received from the i-th data source would be used above for the following calculation:

$$P \equiv \left(C_i^{d_i} \bmod N\right)^{b} \bmod N$$

Because according to a basic idea of the invention more than one (i.e. a number k of) mappers are applied, the same method is applied as for the exponent b: let us generate d_(ij) as a modular product having a number of factors equaling the number k of the mappers M_(j) (since modular exponents are applied, here the modulus is φ(N)). In the case of the i-th data source, key generation begins with generating the factors of the exponent d_(ij): d_(ij)=d_(ij1)·d_(ij2)· . . . ·d_(ijk) (the first index identifies the cryptographic key of the data source, and the second identifies the mapper) that are randomly selected by the data source and that are each relatively prime to φ(N). Then the extended Euclidean algorithm is applied for computing e_(i), for which the formula e_(i)d_(ij)≡1 mod φ(N) will hold true, i.e. e_(i) will be the inverse cryptographic key of the product. The number of the elements d_(ij), or factors, equals the number k of mappers, so they have to be passed on applying any known method to the mappers M_(j) in encrypted form. Utilizing an element b_(j) randomly selected from the algebraic structure and kept secret, each mapper computes the pseudonym mapping exponent h_(ij)≡b_(j)·d_(ij) mod φ(N) corresponding to the i-th cryptographic key, i.e. the mapping cryptographic key h_(ij) of the mapper M_(j) corresponding to the data source DS_(i). If φ(N) is unknown to the mappers M_(j), they cannot perform normalization according to the modulus φ(N). As a result of this, the (maximum) size of the exponent will be twice the key size (because it is obtained as the product of two numbers that can each be represented utilizing the given key size), which does not pose any practical problems, because it represents the same residue class as would have been the result of normalization.
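The key generation just described can be sketched as follows. The function names (`random_unit`, `generate_source_key`, `mapping_key`), the toy totient and the use of Python's `secrets` module are assumptions made for this illustration; the built-in `pow(d, -1, phi)` (Python 3.8+) stands in for the extended Euclidean algorithm.

```python
import secrets
from math import gcd, prod

PHI = 1008 * 1012                  # toy totient of N = 1009 * 1013 (assumption)

def random_unit(phi):
    """Randomly select a value in [2, phi - 1] that is relatively prime to phi."""
    while True:
        candidate = secrets.randbelow(phi - 2) + 2
        if gcd(candidate, phi) == 1:
            return candidate

def generate_source_key(phi, k):
    """Return (e_i, [d_i1, ..., d_ik]) for one key of a data source."""
    d_factors = [random_unit(phi) for _ in range(k)]   # one decryption factor per mapper
    d = prod(d_factors) % phi                          # modular product of the factors
    e = pow(d, -1, phi)                                # inverse key: e * d == 1 (mod phi)
    return e, d_factors

def mapping_key(b_j, d_ij):
    """Mapper side: h_ij = b_j * d_ij, left unreduced because phi may be unknown."""
    return b_j * d_ij

e_i, d_factors = generate_source_key(PHI, k=3)
h_i1 = mapping_key(13, d_factors[0])                   # 13 is a hypothetical secret b_1
```

Because the product of values that are relatively prime to φ(N) is itself relatively prime to φ(N), the inverse `e_i` always exists. The unreduced `h_i1` is at most twice the key size in bits, as noted above.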

As an initial step of mapping the pseudonym P, the cipher C_(i)≡D^(e_(i)) mod N is passed on by the i-th data source to the mapper M_(j). The mapping process starts at this step. Before the first computational step, the initial value is the cipher of the data source, which is denoted by the index 0: C_(i0)=C_(i), which corresponds to the encrypted entity identifier C_(i0) of the entity identifier D to be mapped. When, following the k-th step, all the mappers have already executed the modular exponentiation operation C_(i,s+1)=C_(i,s)^(h_(ij)) mod N, each applying its own exponent, the pseudonym C_(ik)=P is obtained, because

$$\left(\left(\left(C_i^{h_{i1}}\right)^{h_{i2}}\right)^{\cdots}\right)^{h_{ik}} \bmod N \equiv C_i^{\,\prod_{j=1}^{k} h_{ij} \bmod \varphi(N)} \bmod N \equiv C_i^{h} \bmod N \equiv P$$

where the order of the exponents h_(ij) is arbitrary. Of course, in a concrete implementation the sequence order has to be determined somehow; it can be random, quasi-random, or deterministic. The pseudonym P is therefore generated by sequentially performing, on each encrypted entity identifier C_(i0) encrypted by the data sources DS_(i), a number k of mappings in a permutation of the mappers M_(j) utilizing the mapping cryptographic keys h_(ij) of the mappers M_(j) corresponding to the data sources DS_(i). The solution according to the above formula is also preferable because the representation size does not increase in the course of the calculations, since the result of the exponentiation is normalised by way of the modular operation.
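The complete chain can be traced end to end with a minimal sketch under toy assumptions: the primes 1009 and 1013, k = 3 mappers with hypothetical secret exponents 13, 17 and 19, two data sources with independently generated keys, and a randomly shuffled mapper order. The helper names are not taken from the description.

```python
import random
import secrets
from math import gcd, prod

# Toy constants (assumptions for illustration only).
N = 1009 * 1013
PHI = 1008 * 1012
K = 3
B = [13, 17, 19]                      # secret b_j of each mapper, coprime to PHI

def random_unit(phi):
    """Randomly select a value in [2, phi - 1] that is relatively prime to phi."""
    while True:
        c = secrets.randbelow(phi - 2) + 2
        if gcd(c, phi) == 1:
            return c

def make_source_key(k):
    """Data source (or key manager) side: e_i and the k decryption factors."""
    d_factors = [random_unit(PHI) for _ in range(k)]
    e = pow(prod(d_factors) % PHI, -1, PHI)
    return e, d_factors

def map_to_pseudonym(D, e_i, h_i):
    """Encrypt at the source, then let the mappers act once each in a random order."""
    c = pow(D, e_i, N)                # C_i0, the encrypted entity identifier
    order = list(range(len(h_i)))
    random.shuffle(order)             # a permutation of the mappers
    for j in order:
        c = pow(c, h_i[j], N)         # C_i,s+1 = C_i,s ^ h_ij mod N
    return c                          # C_ik = P

sources = [make_source_key(K) for _ in range(2)]
# Each mapper j combines its own b_j with the factor d_ij received from source i.
h = [[B[j] * d_factors[j] for j in range(K)] for _, d_factors in sources]

D = 123456                            # the same entity identifier at both data sources
P1 = map_to_pseudonym(D, sources[0][0], h[0])
P2 = map_to_pseudonym(D, sources[1][0], h[1])
```

Both data sources obtain the same pseudonym, equal to `pow(D, prod(B) % PHI, N)`, although neither the unencrypted identifier nor the product of the `b_j` values is formed at any step of `map_to_pseudonym`.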

Therefore, such a key system was provided above that also fulfils requirements (3) and (4) above set for pseudonym generation, because for carrying out a mapping the cooperation of a data source and all of the mappers is required. For the same reason it is also impossible to mount an undetected rainbow-table attack, because it is sufficient if one of the mappers detects the initiation of hundreds of millions or billions of mapping processes. In such a case, the mappers not engaged in the cracking operation refuse to perform mappings of the messages encrypted with the key having the given index.

To provide a concrete implementation of the above described idea, the roles of and the mode of cooperation between the different entities, i.e. the data sources DS_(i), the mappers M_(j) and the optionally included key manager KM, have to be established.

With the solution based on residue classes, that is, in the case wherein the entity identifiers D and the pseudonyms P are represented by residue classes modulo N, φ(N) has to be kept secret, because an entity possessing it can compute the inverse exponents, i.e. the inverse keys. In this solution, however, the inverse of the particular exponent factors does not yield the inverse of the encryption mapping, so it does not pose a danger if it is computed. The application of a key manager KM is therefore optional for implementing the system.

An exemplary solution including a key manager KM that can be seen in FIG. 1 comprises the following steps:

-   (1) the constants of the mapping are generated by the key manager KM: N=p·q, φ(N)=(p−1)·(q−1) (both factors are randomly selected prime numbers that can preferably be represented in half the bit length of the chosen key size), of which N is made public to the other entities and φ(N) is kept secret;
-   (2) the secret elements b_(j), i.e. in this embodiment, the exponents, are generated by the mappers M_(j); because the elements have to be relatively prime to φ(N), which is unknown to the mappers M_(j), randomly chosen primes are selected, for which the condition is trivially fulfilled;
-   (3) upon the request of a data source DS_(i), the key manager generates a cryptographic key e_(i) with an identifier i for the applicant, and also generates the corresponding elements d_(ij1), . . . , d_(ijk), i.e. the exponent factors of the mapping for the number k of mappers M_(j) according to the conditions specified in the above description; the keys are identified by a respective index to keep them assigned to the data sources; the data have to be kept secret so they have to be passed on in an encrypted manner;
-   (4) the values h_(ij)≡d_(ij)·b_(j) are computed by the mappers M_(j) and are kept secret; because φ(N) is not known to them, this exponent can be represented in double the bit length of the key size, because this way the residue class cannot be represented applying the least positive integer;
-   (5) the entity identifier D, i.e. the unencrypted data, is mapped by the data source into the cipher C_(i0)≡D^(e_(i)) mod N and is sent to the mappers M_(j);
-   (6) in an agreed-upon order, the value C_(i,s+1)=C_(i,s)^(h_(ij)) mod N is computed, from (s=0) to (s=k−1), by the mappers M_(j) such that the result C_(i,k) of the last mapping will become the pseudonym P.

In the above-described embodiment, the data are encrypted by the data sources DS_(i) applying their respective own secret cryptographic keys e_(i), identified by the index i (a data source DS_(i) can have an arbitrary number of keys). The resulting ciphers are mapped into the pseudonym P by the cooperation of a plurality (a number k) of mappers M_(j) identified by the index j (j ∈ {1 . . . k}).

It is particularly preferable to choose prime numbers as the values p and q, because in that case the number of residues relatively prime to N is known: it is φ(N)=(p−1)·(q−1).

The exemplary implementation without a key manager, shown in FIG. 2, comprises the following steps (here the tasks of the key manager are preferably performed by the data source DS_(i)):

-   (1) the constants of the pseudonym mapping are generated either by a provider or, consensually, by the participating entities: N=p·q, φ(N)=(p−1)·(q−1) (both factors are randomly selected prime numbers that can preferably be represented in half the bit length of the chosen key size); it is not necessary to keep these data secret;
-   (2) the secret elements b_(j), i.e. the exponents, are selected randomly by the mappers M_(j); because φ(N) is known to the mappers in this case, they can check directly that an exponent is relatively prime to φ(N), so it is not necessary to select prime numbers;
-   (3) each data source DS_(i) is adapted for generating an arbitrary number of keys; in this case it is expedient to choose, as the key identifier, the hash code of the elements or factors of the cipher exponent e_(i) and of the elements d_(ij1), . . . , d_(ijk) of the decryption exponent (computed after generating the factors), so that no central coordination is necessary for ensuring uniqueness; the key is generated as follows: the elements or factors d_(ij1), . . . , d_(ijk), each relatively prime to φ(N), are selected randomly by the data source; their modular product d_(ij), calculated for the modulus φ(N) as a value derived from the aggregate of the elements, will be the exponent of the decryption key; this is followed by finding the multiplicative inverse of d_(ij) for the same modulus, whereby e_(i) is obtained. The elements or factors d_(ij1), . . . , d_(ijk) are sent in encrypted form (for example, are uploaded to a blockchain) to the mapper M_(j) corresponding to the particular index; if a blockchain is applied, the cryptographic key can be the public key of the mapper's wallet;
-   (4) the value h_(ij)≡d_(ij)·b_(j) mod φ(N) is computed by the mappers M_(j) and is kept secret;
-   (5) the entity identifier D, i.e. the unencrypted data, is mapped by the data source DS_(i) into the cipher C_(i0)≡D^(e_(i)) mod N and is shared with the mappers M_(j), preferably by writing it into a database that operates according to a protocol verified by third parties and provides decentralized authenticity (practically, into a blockchain);
-   (6) A) in an agreed-upon order, the mappers compute the value C_(i,s+1)^((j))=C_(i,s)^(h_(ij)) mod N, where i denotes the cryptographic key of the data source, s the step number of the mapping, and j the identifier of the mapper; the result C_(i,k) of the last mapping will become the pseudonym P; if the mappers are the nodes of a “permissioned” blockchain, then the result of the mapper j that is entitled to close the following block according to the blockchain consensus protocol is written into the next block (C_(i,s+1)=C_(i,s+1)^((j))); no further operation is performed by this node in the further course of the mapping (the exponent factor h_(ij) can be utilized exactly once by each entity in the mapping sequence); thereby, one fewer of the values C_(i,s+1)^((j)) is generated in each step, and only one in the last step;
-   B) in the case of a public blockchain, the results C_(i,s+1)^((j)) are uploaded by the mappers into the blockchain during each transaction, followed by continuing the computation utilizing a value C_(i,s+1) selected according to a predetermined rule; such a rule can for example be that the modulo-N sum of the values C_(i,s+1)^((j)) is computed, and then the value C_(i,s+1) applied for the further computation will be the value that is arithmetically closest thereto; in such a case, in order that each exponent factor h_(ij) is utilized only once for computing the result, the mapper whose value has already been selected does not take part in further computations; the result C_(i,k) of the last mapping will become the pseudonym P; if a public blockchain is applied, it has to be guaranteed that the data source that is in possession of the unencrypted data is not able to track the chain of computations, so that it cannot connect the data with the computed pseudonym.
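Steps (3) and (4) of the variant without a key manager can be sketched as follows. This is a minimal sketch under assumptions: the toy primes, the use of SHA-256 over a comma-separated encoding of the factors as the "hash code", and the function names `generate_source_key` and `mapper_key` are illustrative choices that the description does not prescribe.

```python
import hashlib
import math
import secrets

# Public constants in this variant (toy values, assumed for illustration).
p, q = 1009, 1013
N = p * q
PHI = (p - 1) * (q - 1)


def generate_source_key(k: int):
    """Step (3): the data source draws k factors coprime to PHI,
    derives e as the inverse of their modular product, and labels
    the key with a hash of the factors."""
    factors = []
    while len(factors) < k:
        candidate = secrets.randbelow(PHI - 2) + 2
        if math.gcd(candidate, PHI) == 1:
            factors.append(candidate)
    d = math.prod(factors) % PHI           # decryption exponent
    e = pow(d, -1, PHI)                    # cipher exponent of the source
    # Assumed encoding: SHA-256 over the comma-separated factors.
    encoded = ",".join(str(f) for f in factors).encode()
    key_id = hashlib.sha256(encoded).hexdigest()
    return e, factors, key_id


def mapper_key(d_ij: int, b_j: int) -> int:
    """Step (4): the mapper knows PHI here, so it can reduce the product."""
    return (d_ij * b_j) % PHI


e, factors, key_id = generate_source_key(3)
h = [mapper_key(f, bj) for f, bj in zip(factors, [5, 11 * 11, 13])]
```

Because the identifier is derived from the key material itself, two data sources can create keys independently without a registry, at the cost of relying on the collision resistance of the chosen hash.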

While computing the pseudonym P, the values C_(i,s+1)^((j))=C_(i,s)^(h_(ij)) mod N are computed by the mappers M_(j), from which the value C_(i,s+1), to be utilized as the input value of the subsequent computation step, is chosen by a program (for example, a blockchain smart contract) operating according to a verified protocol utilizing a deterministic method. Each mapping exponent has to be used exactly once in a mapping: in each following step, only those mappers M_(j) perform a calculation whose result has not yet been selected (as of the current state of the process) as the input of a subsequent mapping. With a number k of mappers, the process is concluded by computing the pseudonym P in the k-th step (P=C_(ik)).
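The selection process described above can be sketched as a loop in which every remaining mapper proposes a candidate and a deterministic rule picks one of them. The sketch below uses the example rule mentioned for the public blockchain case (the candidate closest to the modulo-N sum of all candidates). Reading "arithmetically closest" as the smallest absolute difference, breaking ties by mapper index, and the function name `run_mapping` are assumptions made for this illustration.

```python
def run_mapping(c0: int, h: list, N: int):
    """Apply every mapping exponent in h exactly once, starting from c0.

    Returns the final value and the order in which mappers were selected.
    """
    remaining = dict(enumerate(h))      # mapper index -> secret exponent
    c = c0
    order = []
    while remaining:
        # Every mapper not yet selected computes its candidate value.
        candidates = {j: pow(c, hj, N) for j, hj in remaining.items()}
        # Example rule: pick the candidate closest to the modulo-N sum.
        target = sum(candidates.values()) % N
        chosen = min(candidates,
                     key=lambda j: (abs(candidates[j] - target), j))
        c = candidates[chosen]
        order.append(chosen)
        del remaining[chosen]           # its exponent must not be reused
    return c, order


final, order = run_mapping(4242, [5, 7, 11], 1009 * 1013)
```

Whichever mapper the rule selects at each step, the final value is the same, since repeated modular exponentiation is commutative in the exponents; only the intermediate values depend on the selected order.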

FIG. 3 illustrates how the mapping is performed according to a permutation with a given order; the steps and other information shown in the diagram are to be interpreted as per the description above.

In FIGS. 1 and 2, therefore, two conceptually different solutions are illustrated.

-   -   (1) A key manager KM and a plurality of mappers M_(j) are applied, wherein the key applied for mapping the unencrypted data into the pseudonym P is generated utilizing exponent factors, the key manager KM generating the required pair of cryptographic keys such that the exponent of the decryption key is generated factor-by-factor.
    -   (2) The cryptographic keys are generated by the data sources DS_(i) such that the exponent of the decryption key is generated factor-by-factor. The key applied for mapping the unencrypted data into the pseudonym P is generated by the plurality of mappers M_(j) applying exponent factors; there is no key manager.

As was mentioned in the introduction, pseudonym mapping can also be performed applying points of elliptic curves (see for example the Wikipedia article “Elliptic curve”) defined over the number field of residue classes modulo p (where p is a prime). In this context, let the algebraic structure be the set of points satisfying the equation y²=x³+Ax+B mod p, where x, y, A and B are residue classes modulo the prime number p. First, the unencrypted entity identifier D has to be assigned to a point of the curve. Let us choose a point G of the curve having an order q that is sufficiently great that the points of the message space can be assigned to the points generated by G applying a one-to-one mapping. (For every point of the curve there is a number q giving the number of additions of the point to itself required for reaching the point O at infinity. The smallest of such numbers q gives the order of the point.) To achieve that, for example the following method can be applied (Aritro Sengupta, Utpal Kumar Ray: Message mapping and reverse mapping in elliptic curve cryptosystem (2016)). The binary representation of D is extended by 8 bits at the low-order end. In the above defined formula of the curve, x is substituted with the value thus obtained. If no solution exists for y, then the value of x is increased by one. If a solution does exist, then a point M of the finite algebraic structure defined by the curve has been obtained. The description related to the specification of the objects above is applied here such that this point is projected by the i-th data source DS_(i) to another point C_(i) of the curve applying its own cryptographic key, followed by it being projected by the mappers to the point P utilized as a pseudonym, such that different ciphers C_(i) are assigned to the same point P if and only if the point M was identical.
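The padding-and-increment idea described above can be sketched as follows. The curve parameters (p = 2³¹−1, A = 2, B = 3) and the function names `embed` and `recover` are assumptions chosen for illustration; the prime satisfies p ≡ 3 (mod 4), so a square root can be obtained with a single modular exponentiation. The sketch only finds a point on the curve; it does not check that the point lies in the subgroup generated by G, which a complete implementation would have to ensure.

```python
P_MOD = 2**31 - 1     # toy prime with P_MOD % 4 == 3 (assumed)
A, B = 2, 3           # toy curve y^2 = x^3 + A*x + B (assumed)


def embed(D: int):
    """Map identifier D to a curve point by padding 8 low-order bits
    and incrementing x until x^3 + A*x + B is a square modulo P_MOD."""
    x = D << 8
    if x + 255 >= P_MOD:
        raise ValueError("identifier too large for this toy field")
    for _ in range(256):
        rhs = (x**3 + A * x + B) % P_MOD
        # For P_MOD = 3 mod 4 this is a square root whenever one exists.
        y = pow(rhs, (P_MOD + 1) // 4, P_MOD)
        if (y * y) % P_MOD == rhs:
            return x, y
        x += 1
    raise ValueError("no curve point found within the padding range")


def recover(point) -> int:
    """Reverse mapping: drop the 8 padding bits of the x coordinate."""
    return point[0] >> 8


M = embed(123456)
```

Since roughly half of all x values yield a square, running through all 256 padded candidates without success is extremely unlikely, which is why 8 padding bits are usually sufficient.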

Because the solution based on algebraic structures forming an additive cyclic group operates in an analogous manner to the solution based on a multiplicative cyclic group, it is not shown separately. The references shown in the figures can be substituted, where needed, with the corresponding operations and references included in the following description. The values x, y, A, B and p adapted to define the algebraic structure are defined by the entity providing the pseudonym mapping service, which also selects the point G with a known order greater than the cardinality of the message space. The entity then shares the data with the data sources and the mappers. A respective secret key b_(j), j=1..k, is chosen randomly by each of the number k of mappers from the residue classes mod q, selecting values different from 1 and 0. The sum of these values is denoted by b=Σ_(j=1)^(k) b_(j).

For data provision, as many elements a_(ij) (i.e., numbers) as the number k of mappers are randomly selected from the residue classes mod q by the i-th data source; the sum a_(i) of the elements will be its own cryptographic key e_(i)=a_(i)=Σ_(j=1)^(k) a_(ij). Each element a_(ij) is passed on in encrypted form to the mapper with the corresponding index j, and the latter then computes the mapping key corresponding to the data source applying the formula h_(ij)=a_(ij)+b_(j). In the case of a blockchain system, the public portion of the signing key of the mapper can be utilized for the encryption.

After that, the encryption operation is performed by the data source DS_(i) by adding two points: C_(i0)=M⊕a_(i)G, where the operator ⊕ denotes the addition of two points of the curve, and scalar multiplication denotes repeated addition. The above process carried out on residue classes is modified only in that the operation described below is performed on the points of the curve. In the s-th step the following operation is performed by the mapper with the index j on the data originating from the i-th data source: C_(i,s+1)^((j))=C_(i,s)⊕(−h_(ij)G), where the unary operator “−” denotes the reflection of a curve point over the x axis. The operation ⊕ applied to such a reflected point is hereinafter denoted by the operator ⊖. Thus, performing a complete sequence of mappings, the pseudonym is obtained as a result of the following operations:

$\begin{matrix}{P = M \oplus a_{i}G \ominus h_{i1}G \ominus \ldots \ominus h_{ik}G} \\{= M \oplus a_{i}G \ominus \left( a_{i1} + b_{1} + \ldots + a_{ik} + b_{k} \right)G} \\{= M \oplus \left( a_{i} - \left( \sum\limits_{j = 1}^{k}a_{ij} + \sum\limits_{j = 1}^{k}b_{j} \right) \right)G} \\{= M \oplus \left( a_{i} - a_{i} \right)G \ominus bG} \\{= M \ominus bG}\end{matrix}$

Thus, the same entity identifier D is sent by each data source as a different cipher, but finally it is assigned to the same pseudonym P. Optionally, the x coordinate of the point P can also be applied as the pseudonym.

For computing the pseudonym P, the values C_(i,s+1)^((j))=C_(i,s)⊖h_(ij)G are therefore computed by the mappers M_(j) from (s=0) to (s=k−1), where A⊖B=A⊕(−B), from which the value C_(i,s+1), to be utilized as the input value of the subsequent computation step, is selected by a program (for example, a blockchain smart contract) operating according to a verified protocol utilizing a deterministic method. Each mapping key has to be used exactly once in a mapping. In the next step, only those mappers M_(j) perform a calculation whose result has not yet been selected (as of the current state of the process) as the input of a subsequent mapping. With a number k of mappers, the process ends by computing the pseudonym P in the k-th step (P=C_(ik)).
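The additive variant can be tried out on a very small curve. The following sketch uses the textbook curve y²=x³+2x+2 over the residue classes modulo 17 with the point G=(5, 1) of order 19; these parameters, the helper names and the choice of M as a multiple of G are assumptions made purely for illustration and offer no security. It checks that two data sources with different keys obtain the same pseudonym M⊖bG.

```python
import secrets

P_MOD, A = 17, 2          # toy curve y^2 = x^3 + 2x + 2 over GF(17) (assumed)
G, Q = (5, 1), 19         # base point and its order on this curve
INF = None                # point at infinity


def neg(pt):
    """Reflect a point over the x axis (the unary minus)."""
    return INF if pt is INF else (pt[0], (-pt[1]) % P_MOD)


def add(p1, p2):
    """Add two curve points (the operator written as a circled plus)."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return x3, (lam * (x1 - x3) - y1) % P_MOD


def mul(n, pt):
    """Scalar multiplication by double-and-add, with n reduced mod Q."""
    result, n = INF, n % Q
    while n:
        if n & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        n >>= 1
    return result


K = 3
b = [secrets.randbelow(Q - 2) + 2 for _ in range(K)]   # mapper secrets, not 0 or 1


def source_key(k):
    """Data source draws k addends a_j; their sum is its own key."""
    a = [secrets.randbelow(Q) for _ in range(k)]
    return sum(a) % Q, a


def pseudonym(M, a_sum, a_parts, order):
    c = add(M, mul(a_sum, G))                  # C_i0 = M + a_i*G
    for j in order:
        h = (a_parts[j] + b[j]) % Q            # h_ij = a_ij + b_j
        c = add(c, neg(mul(h, G)))             # subtract h_ij*G
    return c


M = mul(7, G)                                  # hypothetical message point
P1 = pseudonym(M, *source_key(K), order=[0, 1, 2])
P2 = pseudonym(M, *source_key(K), order=[2, 1, 0])
expected = add(M, neg(mul(sum(b), G)))         # M minus b*G
assert P1 == P2 == expected
```

The addends a_(ij) cancel against the key a_(i) of the data source, so only the sum of the mapper secrets remains in the result.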

If a key manager KM is to be utilized, then this entity is applied for generating, i.e. for randomly selecting, the addends of a_(i).

Therefore, in order to ensure that possessing any component of the system is not sufficient to allow for deciphering the relationship between the pseudonym P and the entity identifier D, the following data conversion is performed by the pseudonym mapping system according to the invention:

-   -   producing from data that are available at data sources DS_(i) and that are suitable for identifying persons, things or other entities by a characteristic name, i.e. from the entity identifier D,
    -   such pseudonymised data, in which the entity identifiers D are replaced by a pseudonym P assigned thereto in a one-to-one manner independent of the cryptographic key e_(i) utilized by the data source DS_(i),

such that

-   -   as many elements d_(ij), a_(ij) as the number of the mappers M_(j) are chosen randomly by the data sources DS_(i), applying an encryption means/module, from an algebraic structure forming a multiplicative or additive cyclic group utilized by the cryptographic algorithm (preferably, the encryption means/module generates a random number that can be mapped into the key space by a suitable mapping, so that the elements of the key space are selected with a near-uniform probability, i.e. randomly); from these elements the inverse cryptographic key is computed, depending on the structure, utilizing their product or their sum, and is utilized by each data source as its own unique encryption key e_(i) for encrypting its respective entity identifiers D, and
    -   these ciphers (or encrypted data) are mapped into respective pseudonyms by a plurality of encryption means called mappers M_(j), the mapping being performed by the mappers M_(j) in a centralized system or in a decentralized (peer-to-peer, e.g. blockchain) network applying their respective own unique mapping cryptographic keys h_(ij), executing their own operation in an arbitrary sequence order,

such that

-   -   for computing each unique mapping cryptographic key h_(ij), a single one of the elements d_(ij), a_(ij) passed on by the data source DS_(i), and a data element that is randomly selected by the mapper itself from the applied algebraic structure and is kept secret, are applied.

The computer system for cryptographic pseudonymisation according to the invention comprises

-   -   data sources DS_(i) containing data related to entities, the data being identified at the data sources DS_(i) by the entity identifiers D of the entities,
    -   a pseudonymised database DB wherein the data are identified by respective pseudonyms P assigned, in a one-to-one manner, to each of the entity identifiers D,
    -   a number k (i.e. more than one) of mappers M_(j),
    -   optionally, a key manager KM, and
    -   modules implementing the above described functions and/or entities, which modules can be hardware, software, or combined hardware-software modules.

The key manager KM is preferably an apparatus comprising a processor adapted for executing a program and memory adapted for providing data writing, storage, and read-out functions. The program run on the apparatus is adapted to generate the data required for executing the mappings, for example the modular exponent adapted for generating a pseudonym from unencrypted data and the totient value of the modulus. The apparatus is adapted for storing these values such that they cannot be accessed by anybody else, while remaining capable of performing computations utilizing them. In addition to that, it is also capable of computing modular exponent key pairs applying the above described process, for example by the extended Euclidean algorithm, of passing on the encrypted exponent to the data source over a secure data channel, and of computing the exponent applied for pseudonym mapping, which it can also pass on to the entity performing the mapping over a secure data channel. All these requirements are fulfilled for example by the above-mentioned Trusted Platform Module (TPM) circuits.

The mapper M_(j) is preferably an apparatus that is adapted for reading any input parameters of modular exponentiation (base, exponent, modulus), as well as for executing the operation and making the result available for readout. The mapper apparatus has to comprise a module adapted for random number generation. Such an apparatus can for example be implemented as a general-purpose computer or microcontroller. TPM circuits also fulfil all the above listed requirements.

Another aspect of the invention is a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method according to the invention. The invention further relates to a computer-readable medium adapted for storing the above-mentioned computer program.

The invention can be applied for various purposes, one of these being the analysis of loyalty card purchase databases involving multiple stores. Let us assume that a company engaged in business analysis and market research activities prepares an analysis of typical customer behaviour in retail stores, which is then purchased by its clients. The analysis is aimed at defining customer groups based on characteristics like the products purchased, the frequency of purchases, the relationship between the type and location of stores, the season of the year and the products purchased, etc.

In order to prepare the analysis, the company needs data. In addition to using publicly available statistical data, such companies often seek to motivate retailer chains and individual stores into cooperating with them. To facilitate that, they for example share part of their research results with the retailers so that the retailers can improve the efficacy of their advertising and their selection of products. In many store purchase transactions, none of the characteristics of the customer are known. Although the data included in the receipt can be utilized, the only extra information it provides compared to product sale statistics is which products were sold during a single purchase transaction and the exact time and date thereof. At the same time, the stores can also offer loyalty card programs. Customers are offered various discounts for taking part in such programs. In the case of such purchases, personal information on the customer and other data thereof relevant for analytic purposes are known. Such data have been passed on (in varying detail) to market research companies by some of the stores (data sources); however, due to a change in legislation related to protecting personal data, this practice will soon end. So, the most important product of the market research company, the “retail market report”, has become jeopardized. The regulation on personal data protection makes the above business impossible, although analysing the behaviour of customer groups does not require the possession of concrete personal data of any of the customers.

If those pieces of data that are applicable for personal identification are simply removed from the data passed on by the stores (except for, possibly, sex, age and postcode), then more valuable results can be obtained compared to those based on purchase receipts, but the information related to particular purchases of a given (anonymous) person at a given store is lost, although possessing and processing such information is not legally prohibited. The stores have therefore committed to use a made-up identifier, i.e. a pseudonym, for the identification of the purchases of a given customer. This further improves analysability, but this way a customer who made purchases in different stores will be treated as multiple different persons if the mode of pseudonymization is not uniform.

The idea may arise that a mapping implemented utilizing a so-called “salted” cryptographic hash function can be applied to the personal data (such as name, sex, birth date, and postcode), but certain lawyers representing the stores may reject this option because the resulting hash data can be connected, by the entity performing the data analysis, to the personal data simply by registering itself as a store and compiling a rainbow table, for example from the electoral register. The invention provides a solution to this problem. The implementation of the solution according to the invention can comprise a server software component that allows the data sources DS_(i) to generate and store their key on their own computer by visiting a web page (after authentication). The computations to be performed by the mappers M_(j) and the program supporting communication with the blockchain system can be written for a cloud environment. The service can be activated at various different cloud service providers such that its operation cannot be affected (except for starting and stopping it) by any of the entities; this setup can preferably also be audited. Utilizing the client software belonging to the web page, the key factors are distributed by the stores to the mappers M_(j); the stores then upload the data to the blockchain (after encrypting them utilizing the key stored at them), where the pseudonym is generated as a result of the mapping sequence.
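The rainbow-table objection raised at the start of the preceding paragraph can be illustrated with a short sketch. Everything in it is a made-up placeholder: the salt, the record format, the register entries and the function name `salted_pseudonym` are hypothetical, and SHA-256 merely stands in for whichever hash function the stores might agree on. The point is only that any party that learns the shared salt, for example by registering as a store, can hash a public register and invert the pseudonyms by lookup.

```python
import hashlib

SHARED_SALT = b"salt-known-to-every-store"      # hypothetical shared salt


def salted_pseudonym(record: str, salt: bytes) -> str:
    """Pseudonym as a salted hash of the personal data record."""
    return hashlib.sha256(salt + record.encode()).hexdigest()


# Hypothetical public register with name|sex|birth date|postcode records.
register = [
    "Jane Example|F|1980-01-01|1011",
    "John Sample|M|1975-06-15|2040",
]

# The analyst, knowing the salt, precomputes pseudonym -> record.
lookup = {salted_pseudonym(r, SHARED_SALT): r for r in register}

# A pseudonym received from a store is resolved by a dictionary lookup.
received = salted_pseudonym("John Sample|M|1975-06-15|2040", SHARED_SALT)
identified = lookup.get(received)
```

In the method according to the invention no single participant holds a key that maps the unencrypted data directly into the pseudonym, so such a table cannot be compiled by one party alone.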

Thereby, the ciphers generated individually by the different stores are mapped into the same value by the entire computational chain. Also, the application of blockchain technology makes it impossible to compile a rainbow table that would be applicable for restoring the relationship between the unencrypted data and the pseudonym P.

Thus, the analyses can be applied for picking out customers who typically make their purchases in a given store but usually buy a particular product somewhere else, or who on certain days do their shopping at a different location shortly after store closure. These are valuable pieces of information that can support business decisions, for example a decision to stock another brand of a particular product, or to close an hour later on Fridays.

LIST OF REFERENCE SIGNS

-   D entity identifier
-   P pseudonym
-   DS_(i) data sources
-   M_(j) mappers
-   KM key manager
-   d_(ij) elements (inverse key factors in a multiplicative structure)
-   d_(ij) modular product of elements
-   e_(i) cryptographic keys (in a multiplicative structure)
-   a_(ij) elements (cryptographic key addends in an additive algebraic structure)
-   a_(i) sum
-   b_(j) secret element (factor or addend of own mapping cryptographic key)
-   C_(i0) encrypted entity identifier
-   h_(ij) mapping cryptographic keys
-   DB database

1. A cryptographic pseudonym mapping method for an anonymous data sharing system, the method being adapted for generating a pseudonymised database (DB) from data relating to entities and originating from data sources (DS_(i)), wherein the data are identified at the data sources (DS_(i)) by entity identifiers (D) of the respective entities, and wherein the data are identified in the pseudonymised database (DB) by pseudonyms (P), characterised in that the pseudonyms (P) are assigned to the respective entity identifiers (D) applying a one-to-one mapping, irrespective of the originating data source, by applying more than one, a number k of mappers (M_(j)), selecting for each data source (DS_(i)) elements (d_(ij), a_(ij)) in a number equal to the number of mappers (M_(j)) from a predetermined algebraic structure constituting a multiplicative or an additive cyclic group, of which elements (d_(ij), a_(ij)) one element (d_(ij), a_(ij)) is sent, while being kept secret and kept assigned to the data source (DS_(i)), to each mapper (M_(j)), calculating from the plurality of elements (d_(ij), a_(ij)) an inverse cryptographic key of said plurality, and transforming each entity identifier (D) to be mapped into a respective encrypted entity identifier (C_(i0)) by using the inverse cryptographic key as an own secret cryptographic key (e_(i)) of the data source (DS_(i)), generating for each mapper (M_(j)) its mapping cryptographic key (h_(ij)) corresponding to the data source (DS_(i)) by using the element (d_(ij), a_(ij)) that was sent to the mapper (M_(j)) and an element (b_(j)) selected randomly from the algebraic structure and kept secret by the mapper (M_(j)), and generating respective pseudonyms (P) by sequentially performing, in a permutation of the mappers (M_(j)), a number k of mappings utilizing the mapping cryptographic keys (h_(ij)) of the mappers (M_(j)) belonging to the particular data source (DS_(i)) on each encrypted entity identifier (C_(i0)) encrypted by the data source (DS_(i)).
 2. The method according to claim 1, characterised by applying an algebraic structure constituting a multiplicative cyclic group, wherein values are represented by residue classes modulo N, for which algebraic structure constants N=p·q and φ(N)=(p−1)·(q−1) are predetermined, where p and q are randomly selected prime numbers, and φ(N) is the value of the Euler function obtained for N, for generating the own cryptographic key (e_(i)) of each data source (DS_(i)), randomly selected factors that are relatively prime to the modulus φ(N) are generated as elements (d_(ij)) in a number corresponding to the number of the mappers (M_(j)), the multiplicative inverse, for φ(N) taken as a modulus, of the modulo product (d_(ij)) of the randomly selected factors is obtained in a manner known per se, and said multiplicative inverse value is chosen as the own cryptographic key (e_(i)) of the data source (DS_(i)), for which value the formula e_(i)d_(ij)≡1 mod φ(N) holds true, the elements (d_(ij)) are sent encrypted to the mappers (M_(j)), the mappers (M_(j)) are applied for decrypting their respective own elements (d_(ij)), and the mapping cryptographic key (h_(ij)) of each mapper (M_(j)) corresponding to the data source (DS_(i)) is generated applying the formula h_(ij)=d_(ij)b_(j) mod φ(N), where the randomly selected secret element (b_(j)) is relatively prime to φ(N), the encrypted entity identifier (C_(i0)) is computed by the data source (DS_(i)) utilizing the formula C_(i0)=D^(e_(i)) mod N, and the sequential mappings of the mappers (M_(j)) are performed, from (s=0) to (s=k−1), applying the formula C_(i,s+1)^((j))=C_(i,s)^(h_(ij)) mod N, where P=C_(ik).
 3. The method according to claim 2, characterised in that the randomly selected prime numbers p and q can be represented utilizing half the number of bits of a chosen key size.
 4. The method according to claim 2, characterised in that the encrypted entity identifier (C_(i0)) is shared with the mappers (M_(j)) by writing it into a database that operates according to a protocol verified by third parties and provides decentralized authenticity.
 5. The method according to claim 4, characterised in that a blockchain database is applied as the database providing decentralized authenticity.
 6. The method according to claim 2, characterised in that the constants N and φ(N) of the algebraic structure are generated by a key manager (KM), of which φ(N) is kept secret, and the own cryptographic key (e_(i)) of the data source (DS_(i)) and the elements (d_(ij)) corresponding thereto are generated by the key manager (KM) and are sent encrypted to the data source (DS_(i)) and to the mappers (M_(j)), wherein a prime number is chosen as the randomly selected secret element (b_(j)).
 7. The method according to claim 1, characterised by applying an algebraic structure constituting an additive cyclic group, wherein values are represented by points of elliptic curves defined over a number field of residue classes modulo p, where p is a prime number, for which algebraic structure the following constants are predetermined: parameters A, B of the formula y²=x³+Ax+B mod p defining the points of an elliptic curve defined over the residue classes of the prime number p, and a point G of the curve that has an order q that is greater than the number of entity identifiers (D), for generating the own cryptographic key (e_(i)) of each data source (DS_(i)), elements (a_(ij)) are selected from the residue classes mod q in a number corresponding to the number of the mappers (M_(j)), and the sum (a_(i)) of the elements is chosen as the own cryptographic key (e_(i)) of the data source (DS_(i)), for which the formula e_(i)=a_(i)=Σ_(j=1)^(k) a_(ij) holds, the elements (a_(ij)) are sent encrypted to the mappers (M_(j)), the mappers (M_(j)) are applied for decrypting their respective own elements (a_(ij)), and the mapping cryptographic key (h_(ij)) of each mapper (M_(j)) corresponding to the data source (DS_(i)) is generated applying the formula h_(ij)=a_(ij)+b_(j), where the randomly selected secret element (b_(j)) is a value of the residue classes mod q and is different from zero and one, the encrypted entity identifier (C_(i0)) is computed by the data source (DS_(i)) utilizing the formula C_(i0)=M⊕a_(i)G, where M is the point of the curve assigned to the entity identifier (D) and the operator ⊕ is the sum of the points of the elliptic curve, and the sequential mappings of the mappers (M_(j)) are performed, from (s=0) to (s=k−1), applying the formula C_(i,s+1)^((j))=C_(i,s)⊖h_(ij)G, where A⊖B=A⊕(−B) and P=C_(ik).
 8. The method according to claim 7, characterised in that the encrypted entity identifier (C_(i0)) is shared with the mappers (M_(j)) by writing it into a database that operates according to a protocol verified by third parties and provides decentralized authenticity.
 9. The method according to claim 8, characterised in that a blockchain database is applied as the database providing decentralized authenticity.
 10. The method according to claim 7, characterised in that the constants of the algebraic structure are generated by a key manager (KM), and the own cryptographic key (e_(i)) of the data source (DS_(i)) and the elements (a_(ij)) corresponding thereto are generated by the key manager (KM) and are sent encrypted to the data source (DS_(i)) and to the mappers (M_(j)).
 11. The method according to claim 1, characterised in that the mappers (M_(j)) constitute a decentralized network.
 12. A computer system for cryptographic pseudonymisation, the system comprising: data sources (DS_(i)) comprising data relating to entities, the data being identified at the data sources (DS_(i)) by entity identifiers (D) of the entities, and a pseudonymised database (DB), in which the data are identified by pseudonyms (P), characterised in that the pseudonyms (P) are assigned to the respective entity identifiers (D) applying a one-to-one mapping, irrespective of the originating data source, and the system further comprises more than one, a number k of mappers (M_(j)), a module adapted for selecting for each data source (DS_(i)) elements (d_(ij), a_(ij)) in a number equal to the number of mappers (M_(j)) from a predetermined algebraic structure constituting a multiplicative or an additive cyclic group, a module adapted for sending to each mapper (M_(j)) one of the elements (d_(ij), a_(ij)), the sending module being configured to send the element (d_(ij), a_(ij)) by keeping it secret and keeping it assigned to the data source (DS_(i)), a module adapted for calculating from the plurality of elements (d_(ij), a_(ij)) an inverse cryptographic key of said plurality, a module for transforming each entity identifier (D) to be mapped into a respective encrypted entity identifier (C_(i0)) utilizing the inverse cryptographic key as the own secret cryptographic key (e_(i)) of the data source (DS_(i)), a module adapted for generating, for each mapper (M_(j)), a mapping cryptographic key (h_(ij)) thereof corresponding to the data source (DS_(i)) utilizing the element (d_(ij), a_(ij)) that was sent to the mapper (M_(j)) and an element (b_(j)) selected randomly from the algebraic structure and kept secret by the mapper (M_(j)), and a module adapted for generating a respective pseudonym (P) for each encrypted entity identifier (C_(i0)) encrypted by the data source (DS_(i)) by sequentially performing, in a permutation of the mappers (M_(j)), a number k of mappings utilizing the mapping cryptographic keys (h_(ij)) of the mappers (M_(j)) corresponding to the particular data source (DS_(i)).
 13. The computer system according to claim 12, characterised in that it comprises modules adapted for performing a cryptographic pseudonym mapping method for an anonymous data sharing system, the method being adapted for generating a pseudonymised database (DB) from data relating to entities and originating from data sources (DS_(i)), wherein the data are identified at the data sources (DS_(i)) by entity identifiers (D) of the respective entities, and wherein the data are identified in the pseudonymised database (DB) by pseudonyms (P), characterised in that the pseudonyms (P) are assigned to the respective entity identifiers (D) applying a one-to-one mapping, irrespective of the originating data source, by applying more than one, a number k of mappers (M_(j)), selecting for each data source (DS_(i)) elements (d_(ij), a_(ij)) in a number equal to the number of mappers (M_(j)) from a predetermined algebraic structure constituting a multiplicative or an additive cyclic group, of which elements (d_(ij), a_(ij)) one element (d_(ij), a_(ij)) is sent, while being kept secret and kept assigned to the data source (DS_(i)), to each mapper (M_(j)), calculating from the plurality of elements (d_(ij), a_(ij)) an inverse cryptographic key of said plurality, and transforming each entity identifier (D) to be mapped into a respective encrypted entity identifier (C_(i0)) by using the inverse cryptographic key as an own secret cryptographic key (e_(i)) of the data source (DS_(i)), generating for each mapper (M_(j)) its mapping cryptographic key (h_(ij)) corresponding to the data source (DS_(i)) by using the element (d_(ij), a_(ij)) that was sent to the mapper (M_(j)) and an element (b_(j)) selected randomly from the algebraic structure and kept secret by the mapper (M_(j)), and generating respective pseudonyms (P) by sequentially performing, in a permutation of the mappers (M_(j)), a number k of mappings utilizing the mapping cryptographic keys (h_(ij)) of the mappers (M_(j)) belonging to the particular data source (DS_(i)) on each encrypted entity identifier (C_(i0)) encrypted by the data source (DS_(i)).
 14. The computer system according to claim 13, characterised in that, for sharing the encrypted entity identifier (C_(i0)) with the mappers (M_(j)), it comprises a database that operates according to a protocol verified by third parties and provides decentralized authenticity.
 15. The computer system according to claim 14, characterised in that a blockchain database is applied as the database providing decentralized authenticity.
 16. The computer system according to claim 12, characterised in that it comprises a key manager (KM) adapted for generating the constants of the algebraic structure and the own cryptographic key (e_(i)) of the data source (DS_(i)) and the elements (d_(ij), a_(ij)) corresponding thereto.
 17. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method according to claim 1.
 18. A computer-readable medium adapted for storing the computer program according to claim 17.